CCHMC Bilingual Data Science Meeting
7/13/23
BUG & RUG present
July 13th, 2023
Join the RUG Outlook group for updates and events.
Why?
Examples
| R | Python | Examples |
|---|---|---|
| Single-element vector | Scalar | 1, 1L, TRUE, "foo" |
| Multi-element vector | List | c(1.0, 2.0, 3.0), c(1L, 2L, 3L) |
| List of multiple types | Tuple | list(1L, TRUE, "foo") |
| Named list | Dict | list(a = 1L, b = 2.0), dict(x = x_data) |
| Matrix/Array | NumPy ndarray | matrix(c(1,2,3,4), nrow = 2, ncol = 2) |
| Data Frame | Pandas DataFrame | data.frame(x = c(1,2,3), y = c("a", "b", "c")) |
| Function | Python function | function(x) x + 1 |
| NULL, TRUE, FALSE | None, True, False | NULL, TRUE, FALSE |
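For readers arriving from the Python side, the right-hand column of the table can be sketched in plain Python. The values mirror the R examples in the table. The labels, the `add_one` function and the `describe` helper are illustrative names only, and the NumPy and pandas rows are left out so the snippet needs nothing beyond the standard library.

```python
def add_one(x):
    """Counterpart of the R function `function(x) x + 1`."""
    return x + 1


# R example (as written in the table) -> the Python value it corresponds to.
python_side = {
    'R: 1': 1.0,                                    # R numerics are doubles
    'R: 1L': 1,                                     # R integer
    'R: TRUE': True,
    'R: "foo"': "foo",
    'R: c(1.0, 2.0, 3.0)': [1.0, 2.0, 3.0],         # multi-element vector
    'R: c(1L, 2L, 3L)': [1, 2, 3],
    'R: list(1L, TRUE, "foo")': (1, True, "foo"),   # list of multiple types
    'R: list(a = 1L, b = 2.0)': {"a": 1, "b": 2.0}, # named list
    'R: function(x) x + 1': add_one,
    'R: NULL': None,
}


def describe(mapping):
    """Return the Python type name for every entry in the mapping."""
    return {label: type(value).__name__ for label, value in mapping.items()}


if __name__ == "__main__":
    for label, type_name in describe(python_side).items():
        print(f"{label:28} -> {type_name}")
```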
reticulate
{reticulate} is the R interface to Python
https://rstudio.github.io/reticulate/
By default, {reticulate} uses the first available non-system Python executable.
Alternatively, create or specify Python versions in virtual or Conda environments
virtualenv: r-parcel
```
python: /Users/broeg1/.virtualenvs/r-reticulate/bin/python
libpython: /Users/broeg1/.pyenv/versions/3.9.12/lib/libpython3.9.dylib
pythonhome: /Users/broeg1/.virtualenvs/r-reticulate:/Users/broeg1/.virtualenvs/r-reticulate
version: 3.9.12 (main, May 11 2023, 16:29:21) [Clang 14.0.3 (clang-1403.0.22.14.1)]
numpy: /Users/broeg1/.virtualenvs/r-reticulate/lib/python3.9/site-packages/numpy
numpy_version: 1.25.1
os: /Users/broeg1/.pyenv/versions/3.9.12/lib/python3.9

python versions found:
 /Users/broeg1/.virtualenvs/r-reticulate/bin/python
 /opt/homebrew/bin/python3
 /usr/bin/python3
 /Users/broeg1/.virtualenvs/r-parcel/bin/python
```
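The listing above appears to be configuration output reported on the R side. As a rough Python-side counterpart, the standard library exposes similar facts about whichever interpreter is running. This is only a sketch and not how {reticulate} builds its report; `interpreter_summary` is a hypothetical helper, and `numpy` is just the default module to look up.

```python
import importlib.util
import sys


def interpreter_summary(probe_module="numpy"):
    """Collect a few facts about the running interpreter.

    `probe_module` is looked up but not imported; its entry is None when
    the module cannot be found in this environment.
    """
    spec = importlib.util.find_spec(probe_module)
    return {
        "python": sys.executable,
        "prefix": sys.prefix,
        "base_prefix": sys.base_prefix,
        "version": sys.version,
        # Inside a virtual environment the two prefixes differ.
        "in_virtualenv": sys.prefix != sys.base_prefix,
        probe_module: spec.origin if spec else None,
    }


if __name__ == "__main__":
    for key, value in interpreter_summary().items():
        print(f"{key}: {value}")
```

Running this with the interpreter from a given environment shows whether that environment, and the packages installed in it, are the ones actually in use.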
A simple install with py_install() is, by default, placed in a virtualenv or Conda environment named r-reticulate.
Create an environment, install packages within it, and then call them from R.
(Environments can also be managed with the usual Python tools.)
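As one example of the usual Python tools, the standard-library `venv` module can create an environment without going through R. The sketch below is a minimal illustration under stated assumptions: `create_env` is a hypothetical helper, the name `r-parcel` is borrowed from the output shown earlier, and a temporary directory stands in for the real environment root (the paths shown earlier suggest `~/.virtualenvs`).

```python
import os
import tempfile
import venv
from pathlib import Path


def create_env(root, name="r-parcel", with_pip=False):
    """Create a virtual environment at root/name and return its path.

    with_pip=False keeps the sketch fast and offline; pass True when the
    environment should be able to install packages with pip.
    """
    env_dir = Path(root) / name
    # Symlink the interpreter on POSIX, copy it on Windows.
    builder = venv.EnvBuilder(with_pip=with_pip, symlinks=(os.name != "nt"))
    builder.create(env_dir)
    return env_dir


if __name__ == "__main__":
    # A throwaway directory stands in for the real environment root.
    with tempfile.TemporaryDirectory() as root:
        env = create_env(root)
        print(env, (env / "pyvenv.cfg").is_file())
```

An environment created this way should then be selectable from R by name or path, for instance with use_virtualenv().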
usaddress 🇺🇸: a Python library for parsing unstructured United States address strings into address components.
Uses a probabilistic parser trained on real, parsed addresses to return tagged address parts for each address type; e.g.,

```python
>>> import usaddress
>>> usaddress.tag('123 Main St. Suite 100 Chicago, IL')
(OrderedDict([
    ('AddressNumber', '123'),
    ('StreetName', 'Main'),
    ('StreetNamePostType', 'St.'),
    ('OccupancyType', 'Suite'),
    ('OccupancyIdentifier', '100'),
    ('PlaceName', 'Chicago'),
    ('StateName', 'IL')]),
 'Street Address')
```
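To show how a tagged result like the one above could be reshaped into tidy, lower-case columns, here is a small sketch. It works on the output copied from the example, so usaddress itself is not needed to run it. The column names follow those used in the parcel test in this deck. The label-to-column mapping, the punctuation stripping and the `tidy_address` helper are assumptions made for illustration and are not parcel's actual implementation.

```python
from collections import OrderedDict

# Tagged result copied from the usaddress example above.
TAGGED = (
    OrderedDict([
        ("AddressNumber", "123"),
        ("StreetName", "Main"),
        ("StreetNamePostType", "St."),
        ("OccupancyType", "Suite"),
        ("OccupancyIdentifier", "100"),
        ("PlaceName", "Chicago"),
        ("StateName", "IL"),
    ]),
    "Street Address",
)

# Hypothetical mapping from usaddress labels to tidy column names.
COLUMN_FOR_LABEL = {
    "AddressNumber": "street_number",
    "StreetName": "street_name",
    "StreetNamePostType": "street_name",
    "PlaceName": "city",
    "StateName": "state",
    "ZipCode": "zip_code",
}


def tidy_address(tagged):
    """Collapse tagged components into lower-case columns.

    Labels without a column (for example OccupancyType) are dropped, and
    labels that share a column are joined with a space in input order.
    """
    components, address_type = tagged
    row = {}
    for label, value in components.items():
        column = COLUMN_FOR_LABEL.get(label)
        if column is None:
            continue
        cleaned = value.lower().strip(" .,")
        row[column] = f"{row[column]} {cleaned}" if column in row else cleaned
    return row, address_type


if __name__ == "__main__":
    print(tidy_address(TAGGED))
```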
Call the usaddress module from R by importing it, e.g. with reticulate::import("usaddress").
Call functions (and other data) within Python modules (and classes) via the $ operator, e.g. usaddress$tag(). This means code completion and inline help are built in!
https://github.com/geomarker-io/parcel
Followed best-practice suggestions from the {reticulate} package authors.
https://github.com/geomarker-io/parcel/tree/main#installation
From parcel:

```r
# Skip a test when the usaddress Python module cannot be found.
skip_if_no_usaddress <- function() {
  have_usaddress <- reticulate::py_module_available("usaddress")
  if (!have_usaddress) {
    skip("usaddress python module not available for testing")
  }
}

test_that("tag_address works", {
  skip_if_no_usaddress()
  # The tagged address is expected as a one-row tibble of lower-case parts.
  tag_address("3333 Burnet Ave Cincinnati OH 45219") |>
    expect_identical(
      tibble::tibble(
        street_number = "3333",
        street_name = "burnet ave",
        city = "cincinnati",
        state = "oh",
        zip_code = "45219"
      )
    )
})
```
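The same guard can be written on the Python side for anyone testing usaddress directly. The sketch below uses only the standard library: `module_available` is a hypothetical helper in the spirit of the R check above, and the test asserts only the general shape of the result (a mapping of components plus an address type string) rather than specific tags.

```python
import importlib.util
import unittest


def module_available(name):
    """Return True when a top-level module can be found without importing it."""
    return importlib.util.find_spec(name) is not None


HAVE_USADDRESS = module_available("usaddress")


@unittest.skipUnless(
    HAVE_USADDRESS, "usaddress python module not available for testing"
)
class TagAddressTest(unittest.TestCase):
    def test_tag_returns_components_and_type(self):
        import usaddress  # imported here so a missing module only skips

        components, address_type = usaddress.tag(
            "3333 Burnet Ave Cincinnati OH 45219"
        )
        self.assertGreater(len(components), 0)
        self.assertIsInstance(address_type, str)


if __name__ == "__main__":
    unittest.main()
```

When usaddress is not installed the test class is reported as skipped instead of failing, which keeps the suite usable on machines without the Python dependency.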